Improving Precision in Information Retrieval for Swedish using Stemming

نویسندگان

  • Johan Carlberger
  • Hercules Dalianis
  • Martin Duneld
  • Ola Knutsson
چکیده

We will in this paper present an evaluation of how much stemming improves precision in information retrieval for Swedish texts. To perform this, we built an information retrieval tool with optional stemming and created a tagged corpus in Swedish. We know that stemming in information retrieval for English, Dutch and Slovenian gives better precision the more inflecting the language is, but precision depends also on query length and document length. Our final results were that stemming improved both precision and recall with 15 respectively 18 percent for Swedish texts having an average length of 181 words.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

بررسی تأثیرات ریشه‌یابی در بازیابی اطلاعات در زبان فارسی

Using the language-specific behavior in information retrieval systems can improve the quality of the retrieved results significantly. Part of the word that remains after removing its affixes is called stem. Stemming process can be used for improving the relevancy of the results in information retrieval system. Different morphological variants of words (plural, past tense…) will be mapped into t...

متن کامل

Experiments in 8 European Languages with Hummingbird SearchServer™ at CLEF2002

Hummingbird submitted ranked result sets for all Monolingual Information Retrieval tasks of the Cross-Language Evaluation Forum (CLEF) 2002. Enabling stemming in SearchServer increased average precision by 16 points in Finnish, 9 points in German, 4 points in Spanish, 3 points in Dutch, 2 points in French and Italian, and 1 point in Swedish and English. Accent-indexing increased average precisi...

متن کامل

Lexical and Algorithmic Stemming Compared for 9 European Languages with Hummingbird SearchServerTM at CLEF 2003

Hummingbird participated in the monolingual information retrieval tasks of the Cross-Language Evaluation Forum (CLEF) 2003: for natural language queries in 9 European languages (German, French, Italian, Spanish, Dutch, Finnish, Swedish, Russian and English) find all the relevant documents (with high precision) in the CLEF 2003 document sets. For each language, SearchServer scored higher than th...

متن کامل

Improving Persian Information Retrieval Systems Using Stemming and Part of Speech Tagging

With the emergence of vast resources of information, it is necessary to develop methods that retrieve the most relevant information according to needs. These retrieval methods may benefit from natural language constructs to boost their results by achieving higher precision and recall rates. In this study, we have used part of speech properties of terms as extra source of information about docum...

متن کامل

A Study on the use of Stemming for Monolingual Ad-Hoc Portuguese Information Retrieval

For UFRGS’s first participation in CLEF our goal was to compare the performance of heavier and lighter stemming strategies using the Portuguese data collections for monolingual Ad-hoc retrieval. The results show that the safest strategy was to use the lighter alternative (reducing plural forms only). On a query-by-query analysis, full stemming achieved the highest improvement but also the bigge...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001